Annotation of prominent words, prosodic boundaries and segmental lengthening by non-expert transcribers in the Spoken Dutch Corpus
Authors
Abstract
This paper first describes the aims of the prosodic annotation for (part of) the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), and the procedures that are currently being developed to produce the annotation. It further reports on a pilot study that was run to estimate the costs and the attainable quality (in terms of inter-transcriber consistency) of the envisaged annotation. We claim that high-quality prosodic annotation (of prominence, prosodic breaks, and unusual segmental lengthening) can be produced by non-experts, provided they are given a strict, written protocol and a short period of supervision and feedback.
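The abstract does not state which measure of inter-transcriber consistency the pilot study used. One common choice for categorical labels, such as a per-word "prominent / not prominent" decision, is Cohen's kappa, which corrects raw agreement for the agreement two transcribers would reach by chance. The minimal Python sketch below assumes two transcribers have labelled the same sequence of words. The function name `cohens_kappa` and the label data are hypothetical illustrations, not taken from the CGN protocol.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two transcribers' label sequences.

    Assumes both sequences label the same words in the same order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of words given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each transcriber's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both transcribers used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical prominence labels for ten words (1 = prominent, 0 = not).
transcriber_1 = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
transcriber_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(transcriber_1, transcriber_2), 3))  # prints 0.583
```

Here the transcribers agree on 8 of 10 words (0.8), but because both mark 4 of 10 words as prominent, chance agreement is already 0.52, so kappa is only about 0.58. The same function applies unchanged to multi-valued labels such as break strength.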
Similar resources
Prosody in a corpus of French spontaneous speech: perception, annotation and prosody ~ syntax interaction
Our study focuses on the issue of prosodic annotation and of the prosody ~ syntax interface in conversation and is based on a large corpus of conversational speech in French. The results of inter-transcriber agreement tests show that two expert transcribers are consistent in their labeling of prosodic phrasing and that their consistency is well above chance. A qualitative analysis reveals transcri...
Prosodic Labelling and Acoustic Data
Data on the labelling of boundaries and prominences in read and spontaneous speech have been collected from ten non-expert and one expert transcribers and analyzed for their inter-subjective variability. The labellings are matched with acoustic data to explore the relevant cues used by the transcribers. Introduction: Most work on prosody relies on some kind of labelling of the prosodic features ...
An Investigation of Prosody in Hindi Narrative Speech
This paper investigates how prosodic elements such as prominences and prosodic boundaries in Hindi are perceived. We approach this using data from three sources: (i) native speakers of Hindi without any linguistic expertise (ii) a linguistically trained expert in Hindi prosody and finally, (iii) classifiers trained on English for automatic prominence and boundary detection. We use speech from a...
Consistency Maintenance in Prosodic Labeling for Reliable Prediction of Prosodic Breaks
For the implementation of the prosody prediction model, large scale annotated speech corpora have been widely applied. Reliability among transcribers, however, was too low for successful learning of an automatic prosodic prediction. This paper reveals our observations on performance deterioration of the learning model due to inconsistent tagging of prosodic breaks in the established corpora. Th...
What are Transcription Errors and Why are They made?
In recent work we compared transcriptions of German spontaneous dialogues of the VERBMOBIL corpus to ascertain differences between transcribers and quality. A better understanding of where and what kind of inconsistencies occur will help us to improve the working environment for transcribers, to reduce the effort on correction passes, and will finally result in better transcription quality. The...